Overview

Dataset statistics

Number of variables: 44
Number of observations: 8161
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 1.1 MiB
Average record size in memory: 135.0 B
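The overview's memory figures can be cross-checked with simple arithmetic. The per-variable sizes suggest (the report does not say so directly) that 13 columns hold 8-byte values (the 12 numeric variables plus the first binary variable, each listed at 63.9 KiB) and the remaining 31 binary columns hold 1-byte values (listed at 8.1 KiB). Under that assumption each record takes 13 × 8 + 31 × 1 = 135 bytes, which matches the average record size. The raw per-column buffers come out slightly below the listed sizes, presumably because of a small fixed per-column overhead.

```python
# Sketch: reproduce the overview's memory figures from the column counts.
# Assumption (inferred from the per-column memory sizes, not stated in the report):
# 13 columns use 8 bytes per value and 31 columns use 1 byte per value.
N_ROWS = 8161
EIGHT_BYTE_COLS = 13  # listed at 63.9 KiB each
ONE_BYTE_COLS = 31    # listed at 8.1 KiB each

bytes_per_record = EIGHT_BYTE_COLS * 8 + ONE_BYTE_COLS * 1
total_bytes = bytes_per_record * N_ROWS

# Raw data buffer of a single column, before any fixed overhead.
wide_col_kib = N_ROWS * 8 / 1024
narrow_col_kib = N_ROWS * 1 / 1024

print(f"average record size: {bytes_per_record:.1f} B")
print(f"total size: {total_bytes / 2**20:.2f} MiB")
print(f"8-byte column: {wide_col_kib:.1f} KiB, 1-byte column: {narrow_col_kib:.1f} KiB")
```

The total, 1,101,735 bytes, is about 1.05 MiB, which the report rounds to 1.1 MiB.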

Variable types

Boolean: 32
Numeric: 12

Reproduction

Analysis started: 2021-12-10 00:59:02.395690
Analysis finished: 2021-12-10 00:59:33.949141
Version: pandas-profiling v2.6.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

PARENT1_Yes is highly correlated with PARENT1_No [High Correlation]
PARENT1_No is highly correlated with PARENT1_Yes [High Correlation]
MSTATUS_Yes is highly correlated with MSTATUS_No [High Correlation]
MSTATUS_No is highly correlated with MSTATUS_Yes [High Correlation]
SEX_M is highly correlated with SEX_F [High Correlation]
SEX_F is highly correlated with SEX_M [High Correlation]
CAR_USE_Private is highly correlated with CAR_USE_Commercial [High Correlation]
CAR_USE_Commercial is highly correlated with CAR_USE_Private [High Correlation]
REVOKED_Yes is highly correlated with REVOKED_No [High Correlation]
REVOKED_No is highly correlated with REVOKED_Yes [High Correlation]
URBANICITY_Highly Urban/ Urban is highly correlated with URBANICITY_Highly Rural/ Rural [High Correlation]
URBANICITY_Highly Rural/ Rural is highly correlated with URBANICITY_Highly Urban/ Urban [High Correlation]
KIDSDRIV has 7180 (88.0%) zeros [Zeros]
HOMEKIDS has 5289 (64.8%) zeros [Zeros]
YOJ has 625 (7.7%) zeros [Zeros]
OLDCLAIM has 5009 (61.4%) zeros [Zeros]
CLM_FREQ has 5009 (61.4%) zeros [Zeros]
MVR_PTS has 3712 (45.5%) zeros [Zeros]
LOG_INCOME has 615 (7.5%) zeros [Zeros]

Variables

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 63.9 KiB
Value Count Frequency (%)
0 6008 73.6%
1 2153 26.4%
 

KIDSDRIV
Real number (ℝ≥0)

ZEROS
Distinct count: 5
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.17105746844749417
Minimum: 0
Maximum: 4
Zeros: 7180
Zeros (%): 88.0%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 1
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5115340939
Coefficient of variation (CV): 2.99042245
Kurtosis: 11.79177272
Mean: 0.1710574684
Mean Absolute Deviation (MAD): 0.3009907177
Skewness: 3.353069928
Sum: 1396
Variance: 0.2616671292
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 7180 88.0%
1 636 7.8%
2 279 3.4%
3 62 0.8%
4 4 < 0.1%

Minimum 5 values
Value Count Frequency (%)
0 7180 88.0%
1 636 7.8%
2 279 3.4%
3 62 0.8%
4 4 < 0.1%

Maximum 5 values
Value Count Frequency (%)
4 4 < 0.1%
3 62 0.8%
2 279 3.4%
1 636 7.8%
0 7180 88.0%
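For a discrete variable such as KIDSDRIV, the descriptive statistics can be reproduced from the value-count table alone. The sketch below expands the counts back into the 8,161 observations and recomputes the mean, the sample variance and standard deviation (n - 1 denominator, which is what the reported figures correspond to), the coefficient of variation (standard deviation divided by mean), the interquartile range, and the mean absolute deviation, the mean of |x - mean|. That last quantity is what the report's MAD row actually contains: a median-based deviation would be 0 here, since 88% of the values equal the median of 0. `statistics.quantiles` with `method="inclusive"` interpolates between order statistics the same way as pandas' default quantile method.

```python
from statistics import mean, quantiles, stdev, variance

# KIDSDRIV value counts, copied from the report's value-count table.
counts = {0: 7180, 1: 636, 2: 279, 3: 62, 4: 4}

# Expand the counts back into the full list of observations.
data = [value for value, n in counts.items() for _ in range(n)]

m = mean(data)
var = variance(data)  # sample variance: n - 1 denominator
sd = stdev(data)      # square root of the sample variance
cv = sd / m           # coefficient of variation
mean_abs_dev = sum(abs(x - m) for x in data) / len(data)  # mean of |x - mean|
q1, _, q3 = quantiles(data, n=4, method="inclusive")

print(len(data), sum(data))  # 8161 1396
print(f"mean={m:.10f} variance={var:.10f} sd={sd:.10f}")
print(f"CV={cv:.8f} mean abs dev={mean_abs_dev:.10f} IQR={q3 - q1}")
```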
 

AGE
Real number (ℝ≥0)

Distinct count: 60
Unique (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.790466854552136
Minimum: 16.0
Maximum: 81.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 16
5th percentile: 30
Q1: 39
Median: 45
Q3: 51
95th percentile: 59
Maximum: 81
Range: 65
Interquartile range (IQR): 12

Descriptive statistics

Standard deviation: 8.624418837
Coefficient of variation (CV): 0.1925503225
Kurtosis: -0.0581198168
Mean: 44.79046685
Mean Absolute Deviation (MAD): 6.916652779
Skewness: -0.02906388708
Sum: 365535
Variance: 74.38060028
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[16. 20.5 23.5 26.5 29.5 ... 60.5 63.5 67.5 71. 81. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
46 401 4.9%
45 382 4.7%
48 363 4.4%
47 355 4.3%
43 351 4.3%
41 336 4.1%
44 336 4.1%
42 333 4.1%
50 329 4.0%
40 317 3.9%
Other values (50) 4658 57.1%

Minimum 5 values
Value Count Frequency (%)
16 5 0.1%
17 1 < 0.1%
18 3 < 0.1%
19 5 0.1%
20 3 < 0.1%

Maximum 5 values
Value Count Frequency (%)
81 1 < 0.1%
80 1 < 0.1%
76 1 < 0.1%
73 3 < 0.1%
72 3 < 0.1%
 

HOMEKIDS
Real number (ℝ≥0)

ZEROS
Distinct count: 6
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7212351427521138
Minimum: 0
Maximum: 5
Zeros: 5289
Zeros (%): 64.8%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.116323291
Coefficient of variation (CV): 1.547793812
Kurtosis: 0.6510197811
Mean: 0.7212351428
Mean Absolute Deviation (MAD): 0.9348395221
Skewness: 1.341620234
Sum: 5886
Variance: 1.24617769
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 5289 64.8%
2 1118 13.7%
1 902 11.1%
3 674 8.3%
4 164 2.0%
5 14 0.2%

Minimum 5 values
Value Count Frequency (%)
0 5289 64.8%
1 902 11.1%
2 1118 13.7%
3 674 8.3%
4 164 2.0%

Maximum 5 values
Value Count Frequency (%)
5 14 0.2%
4 164 2.0%
3 674 8.3%
2 1118 13.7%
1 902 11.1%
 

YOJ
Real number (ℝ≥0)

ZEROS
Distinct count: 21
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.527141281705674
Minimum: 0.0
Maximum: 23.0
Zeros: 625
Zeros (%): 7.7%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 9
Median: 11
Q3: 13
95th percentile: 15
Maximum: 23
Range: 23
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.978653799
Coefficient of variation (CV): 0.377942472
Kurtosis: 1.453767399
Mean: 10.52714128
Mean Absolute Deviation (MAD): 2.888798207
Skewness: -1.257713512
Sum: 85912
Variance: 15.82968605
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 2.5 4.5 5.5 ... 15.5 16.5 17.5 18.5 23. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
11 1457 17.9%
12 1158 14.2%
13 1016 12.4%
14 785 9.6%
10 749 9.2%
0 625 7.7%
9 521 6.4%
15 463 5.7%
8 384 4.7%
7 300 3.7%
Other values (11) 703 8.6%

Minimum 5 values
Value Count Frequency (%)
0 625 7.7%
1 6 0.1%
2 15 0.2%
3 36 0.4%
4 37 0.5%

Maximum 5 values
Value Count Frequency (%)
23 2 < 0.1%
19 12 0.1%
18 25 0.3%
17 101 1.2%
16 204 2.5%
 

TRAVTIME
Real number (ℝ≥0)

Distinct count: 97
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.48572478862884
Minimum: 5
Maximum: 142
Zeros: 0
Zeros (%): 0.0%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 5
5th percentile: 7
Q1: 22
Median: 33
Q3: 44
95th percentile: 60
Maximum: 142
Range: 137
Interquartile range (IQR): 22

Descriptive statistics

Standard deviation: 15.90833341
Coefficient of variation (CV): 0.4750780672
Kurtosis: 0.6663746248
Mean: 33.48572479
Mean Absolute Deviation (MAD): 12.63117297
Skewness: 0.4469816868
Sum: 273277
Variance: 253.0750719
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 5. 5.5 8.5 15.5 20.5 ... 66.5 71.5 82.5 97.5 142. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
5 334 4.1%
35 219 2.7%
30 219 2.7%
32 214 2.6%
25 214 2.6%
36 211 2.6%
29 207 2.5%
33 206 2.5%
24 204 2.5%
37 202 2.5%
Other values (87) 5931 72.7%

Minimum 5 values
Value Count Frequency (%)
5 334 4.1%
6 49 0.6%
7 43 0.5%
8 54 0.7%
9 70 0.9%

Maximum 5 values
Value Count Frequency (%)
142 1 < 0.1%
134 1 < 0.1%
124 1 < 0.1%
113 1 < 0.1%
103 1 < 0.1%
 

BLUEBOOK
Real number (ℝ≥0)

Distinct count: 2789
Unique (%): 34.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15709.899522117388
Minimum: 1500.0
Maximum: 69740.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 1500
5th percentile: 4900
Q1: 9280
Median: 14440
Q3: 20850
95th percentile: 31110
Maximum: 69740
Range: 68240
Interquartile range (IQR): 11570

Descriptive statistics

Standard deviation: 8419.734075
Coefficient of variation (CV): 0.5359508547
Kurtosis: 0.7935064449
Mean: 15709.89952
Mean Absolute Deviation (MAD): 6732.170474
Skewness: 0.7945061471
Sum: 128208490
Variance: 70891921.9
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1500. 1510. 3310. 4490. 4505. ... 34840. 38925. 44430. 50575. 69740.], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
1500 157 1.9%
6000 34 0.4%
6200 33 0.4%
5800 33 0.4%
6400 31 0.4%
5900 30 0.4%
6100 30 0.4%
6500 29 0.4%
5400 28 0.3%
5700 26 0.3%
Other values (2779) 7730 94.7%

Minimum 5 values
Value Count Frequency (%)
1500 157 1.9%
1520 1 < 0.1%
1530 1 < 0.1%
1540 1 < 0.1%
1590 1 < 0.1%

Maximum 5 values
Value Count Frequency (%)
69740 1 < 0.1%
65970 1 < 0.1%
62240 1 < 0.1%
61050 1 < 0.1%
57970 1 < 0.1%
 

TIF
Real number (ℝ≥0)

Distinct count: 23
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.35130498713393
Minimum: 1
Maximum: 25
Zeros: 0
Zeros (%): 0.0%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 4
Q3: 7
95th percentile: 13
Maximum: 25
Range: 24
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 4.146635309
Coefficient of variation (CV): 0.7748830088
Kurtosis: 0.4243279295
Mean: 5.351304987
Mean Absolute Deviation (MAD): 3.366132908
Skewness: 0.891139559
Sum: 43672
Variance: 17.19458439
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 1. 1.5 2.5 3.5 4.5 ... 16.5 17.5 18.5 21.5 25. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
1 2533 31.0%
6 1341 16.4%
4 1242 15.2%
10 780 9.6%
7 620 7.6%
3 424 5.2%
13 278 3.4%
11 242 3.0%
9 225 2.8%
17 104 1.3%
Other values (13) 372 4.6%

Minimum 5 values
Value Count Frequency (%)
1 2533 31.0%
2 6 0.1%
3 424 5.2%
4 1242 15.2%
5 52 0.6%

Maximum 5 values
Value Count Frequency (%)
25 2 < 0.1%
22 3 < 0.1%
21 11 0.1%
20 8 0.1%
19 8 0.1%
 

OLDCLAIM
Real number (ℝ≥0)

ZEROS
Distinct count: 2857
Unique (%): 35.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4037.0762161499815
Minimum: 0.0
Maximum: 57037.0
Zeros: 5009
Zeros (%): 61.4%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 4636
95th percentile: 27090
Maximum: 57037
Range: 57037
Interquartile range (IQR): 4636

Descriptive statistics

Standard deviation: 8777.139104
Coefficient of variation (CV): 2.174132623
Kurtosis: 9.870592126
Mean: 4037.076216
Mean Absolute Deviation (MAD): 5324.619579
Skewness: 3.12018688
Sum: 32946579
Variance: 77038170.86
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 251. 504. 950.5 1553.5 ... 11748.5 19563.5 41944. 49627. 57037. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 5009 61.4%
4263 4 < 0.1%
1391 4 < 0.1%
1310 4 < 0.1%
3826 3 < 0.1%
2740 3 < 0.1%
4567 3 < 0.1%
3863 3 < 0.1%
4448 3 < 0.1%
8174 3 < 0.1%
Other values (2847) 3122 38.3%

Minimum 5 values
Value Count Frequency (%)
0 5009 61.4%
502 1 < 0.1%
506 1 < 0.1%
518 1 < 0.1%
519 1 < 0.1%

Maximum 5 values
Value Count Frequency (%)
57037 1 < 0.1%
53986 1 < 0.1%
53568 1 < 0.1%
53477 1 < 0.1%
52507 1 < 0.1%
 

CLM_FREQ
Real number (ℝ≥0)

ZEROS
Distinct count: 6
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.7985540987624066
Minimum: 0
Maximum: 5
Zeros: 5009
Zeros (%): 61.4%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 2
95th percentile: 3
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.158452681
Coefficient of variation (CV): 1.450687791
Kurtosis: 0.2860042955
Mean: 0.7985540988
Mean Absolute Deviation (MAD): 0.9802616054
Skewness: 1.209242991
Sum: 6517
Variance: 1.342012615
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[0. 1.5 2.5 3.5 4.5 5. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 5009 61.4%
2 1171 14.3%
1 997 12.2%
3 776 9.5%
4 190 2.3%
5 18 0.2%

Minimum 5 values
Value Count Frequency (%)
0 5009 61.4%
1 997 12.2%
2 1171 14.3%
3 776 9.5%
4 190 2.3%

Maximum 5 values
Value Count Frequency (%)
5 18 0.2%
4 190 2.3%
3 776 9.5%
2 1171 14.3%
1 997 12.2%
 

MVR_PTS
Real number (ℝ≥0)

ZEROS
Distinct count: 13
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.6955030020830781
Minimum: 0
Maximum: 13
Zeros: 3712
Zeros (%): 45.5%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 3
95th percentile: 6
Maximum: 13
Range: 13
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.147111744
Coefficient of variation (CV): 1.266356793
Kurtosis: 1.378141796
Mean: 1.695503002
Mean Absolute Deviation (MAD): 1.739591745
Skewness: 1.348335868
Sum: 13837
Variance: 4.610088843
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.5 1.5 2.5 3.5 ... 6.5 7.5 8.5 9.5 13. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 3712 45.5%
1 1157 14.2%
2 948 11.6%
3 758 9.3%
4 599 7.3%
5 399 4.9%
6 266 3.3%
7 167 2.0%
8 84 1.0%
9 45 0.6%
Other values (3) 26 0.3%

Minimum 5 values
Value Count Frequency (%)
0 3712 45.5%
1 1157 14.2%
2 948 11.6%
3 758 9.3%
4 599 7.3%

Maximum 5 values
Value Count Frequency (%)
13 2 < 0.1%
11 11 0.1%
10 13 0.2%
9 45 0.6%
8 84 1.0%
 

CAR_AGE
Real number (ℝ)

Distinct count: 30
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8.30780541600294
Minimum: -3.0
Maximum: 28.0
Zeros: 3
Zeros (%): < 0.1%
Memory size: 63.9 KiB

Quantile statistics

Minimum: -3
5th percentile: 1
Q1: 4
Median: 8
Q3: 12
95th percentile: 18
Maximum: 28
Range: 31
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.520292254
Coefficient of variation (CV): 0.6644705765
Kurtosis: -0.5945111857
Mean: 8.307805416
Mean Absolute Deviation (MAD): 4.453047173
Skewness: 0.3023587845
Sum: 67800
Variance: 30.47362657
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[-3. 0.5 1.5 2.5 3.5 ... 20.5 21.5 23.5 25.5 28. ], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
1 1934 23.7%
8 1047 12.8%
9 526 6.4%
7 524 6.4%
10 469 5.7%
11 460 5.6%
6 451 5.5%
12 368 4.5%
13 356 4.4%
14 311 3.8%
Other values (20) 1715 21.0%

Minimum 5 values
Value Count Frequency (%)
-3 1 < 0.1%
0 3 < 0.1%
1 1934 23.7%
2 12 0.1%
3 54 0.7%

Maximum 5 values
Value Count Frequency (%)
28 1 < 0.1%
27 1 < 0.1%
26 2 < 0.1%
25 6 0.1%
24 10 0.1%
 

LOG_INCOME
Real number (ℝ≥0)

ZEROS
Distinct count: 6612
Unique (%): 81.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.014746199093
Minimum: 0.0
Maximum: 12.81320159213428
Zeros: 615
Zeros (%): 7.5%
Memory size: 63.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 10.29917165
Median: 10.89727622
Q3: 11.33026385
95th percentile: 11.92191768
Maximum: 12.81320159
Range: 12.81320159
Interquartile range (IQR): 1.031092202

Descriptive statistics

Standard deviation: 2.984950386
Coefficient of variation (CV): 0.29805552
Kurtosis: 6.63399289
Mean: 10.0147462
Mean Absolute Deviation (MAD): 1.716057401
Skewness: -2.811999957
Sum: 81730.34373
Variance: 8.909928808
Histogram with fixed size bins (bins=10)
Histogram with variable size bins (bins=[ 0. 0.89587973 4.90483563 7.44511313 8.29566023 ... 11.69953363 11.94057579 12.21452923 12.40983895 12.81320159], "bayesian blocks" binning strategy used)

Common values
Value Count Frequency (%)
0 615 7.5%
10.89727622 447 5.5%
10.19768585 4 < 0.1%
11.031513 4 < 0.1%
10.78952524 4 < 0.1%
11.31932858 3 < 0.1%
10.82311272 3 < 0.1%
10.76877968 3 < 0.1%
11.87998926 3 < 0.1%
11.29207927 3 < 0.1%
Other values (6602) 7072 86.7%

Minimum 5 values
Value Count Frequency (%)
0 615 7.5%
1.791759469 1 < 0.1%
2.079441542 1 < 0.1%
2.944438979 1 < 0.1%
4.262679877 1 < 0.1%

Maximum 5 values
Value Count Frequency (%)
12.81320159 1 < 0.1%
12.71391382 1 < 0.1%
12.67647619 1 < 0.1%
12.64313009 1 < 0.1%
12.63224847 1 < 0.1%
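The smallest non-zero LOG_INCOME values back-transform to whole numbers: exponentiating them gives 6, 8, 19 and 71. That is consistent with (though the report does not state) LOG_INCOME being the natural logarithm of an integer income for those rows; it says nothing about how the 615 exact zeros arose. The check below uses only values printed in the report.

```python
import math

# Smallest non-zero LOG_INCOME values, copied from the report's ascending list.
smallest_nonzero = [1.791759469, 2.079441542, 2.944438979, 4.262679877]

# Undo a natural-log transform and look at the nearest integers.
recovered = [math.exp(x) for x in smallest_nonzero]
nearest = [round(v) for v in recovered]

for x, v, k in zip(smallest_nonzero, recovered, nearest):
    print(f"exp({x}) = {v:.6f}  (nearest integer {k})")

print(nearest)  # [6, 8, 19, 71]
```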
 

PARENT1_No
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
1 7084 86.8%
0 1077 13.2%
 

PARENT1_Yes
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7084 86.8%
1 1077 13.2%
 

MSTATUS_No
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 4894 60.0%
1 3267 40.0%
 

MSTATUS_Yes
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
1 4894 60.0%
0 3267 40.0%
 

SEX_F
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
1 4375 53.6%
0 3786 46.4%
 

SEX_M
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 4375 53.6%
1 3786 46.4%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 5919 72.5%
1 2242 27.5%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 5831 71.4%
1 2330 28.6%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6503 79.7%
1 1658 20.3%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7433 91.1%
1 728 8.9%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6958 85.3%
1 1203 14.7%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6336 77.6%
1 1825 22.4%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6890 84.4%
1 1271 15.6%
 

JOB_Doctor
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7915 97.0%
1 246 3.0%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7520 92.1%
1 641 7.9%
 

JOB_Lawyer
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7326 89.8%
1 835 10.2%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7173 87.9%
1 988 12.1%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7044 86.3%
1 1117 13.7%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7449 91.3%
1 712 8.7%
 

CAR_USE_Commercial
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 5132 62.9%
1 3029 37.1%
 

CAR_USE_Private
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
1 5132 62.9%
0 3029 37.1%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6016 73.7%
1 2145 26.3%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7485 91.7%
1 676 8.3%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6772 83.0%
1 1389 17.0%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 5867 71.9%
1 2294 28.1%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7254 88.9%
1 907 11.1%

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7411 90.8%
1 750 9.2%
 

REVOKED_No
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
1 7161 87.7%
0 1000 12.3%
 

REVOKED_Yes
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 7161 87.7%
1 1000 12.3%
 

URBANICITY_Highly Rural/ Rural
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
0 6492 79.5%
1 1669 20.5%
 

URBANICITY_Highly Urban/ Urban
Boolean

HIGH CORRELATION
Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.1 KiB
Value Count Frequency (%)
1 6492 79.5%
0 1669 20.5%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables: replacing X by a + bX and Y by c + dY, with b and d both positive, leaves r unchanged. In particular, if the points lie exactly on a line that is neither horizontal nor vertical, r is +1 or -1 however steep that line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
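A minimal pure-Python sketch of that definition follows. It uses population (1/n) moments throughout; the 1/n factors cancel in the ratio, so the n - 1 convention gives the same r. The second call checks the location-and-scale invariance described above using an arbitrary positive rescaling of each variable.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Population covariance; pstdev is the matching population standard deviation.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Separate changes of location and (positive) scale leave r unchanged.
r_rescaled = pearson_r([10 + 3 * a for a in x], [-7 + 0.5 * b for b in y])
print(round(r, 6), round(r_rescaled, 6))  # 0.8 0.8
```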

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r does. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

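To calculate ρ for two variables X and Y, one replaces each observation by its rank within its own variable and divides the covariance of the two rank variables by the product of the rank variables' standard deviations; in other words, ρ is Pearson's r computed on the ranks.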

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

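To calculate τ for two variables X and Y, one counts the concordant pairs of observations (ordered the same way in X and in Y) and the discordant pairs (ordered oppositely). τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations. This is the τ_a variant; when there are tied values, implementations such as SciPy's kendalltau default to the tie-adjusted τ_b.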
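A direct O(n²) sketch of the τ_a definition, in which a pair tied in either variable counts as neither concordant nor discordant. It mirrors the wording of the definition and is not meant to be fast; library implementations use faster algorithms and handle ties differently.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # the pair is ordered the same way in x and y
        elif s < 0:
            discordant += 1  # the pair is ordered oppositely
        # s == 0: tied in x or y, counted in neither group
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]
print(kendall_tau_a(x, y))  # 0.4
```

Here 7 of the 10 pairs are concordant and 3 are discordant, so τ = (7 - 3) / 10 = 0.4.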

Missing values

Sample

First rows

TARGET_FLAGKIDSDRIVAGEHOMEKIDSYOJTRAVTIMEBLUEBOOKTIFOLDCLAIMCLM_FREQMVR_PTSCAR_AGELOG_INCOMEPARENT1_NoPARENT1_YesMSTATUS_NoMSTATUS_YesSEX_FSEX_MEDUCATION_BachelorsEDUCATION_High SchoolEDUCATION_MastersEDUCATION_PhDEDUCATION_lower than High SchoolJOB_Blue CollarJOB_ClericalJOB_DoctorJOB_Home MakerJOB_LawyerJOB_ManagerJOB_ProfessionalJOB_StudentCAR_USE_CommercialCAR_USE_PrivateCAR_TYPE_MinivanCAR_TYPE_Panel TruckCAR_TYPE_PickupCAR_TYPE_SUVCAR_TYPE_Sports CarCAR_TYPE_VanREVOKED_NoREVOKED_YesURBANICITY_Highly Rural/ RuralURBANICITY_Highly Urban/ Urban
00060.0011.01414230.0114461.02318.011.1176581010010001000000010011000001001
10043.0011.02214940.010.0001.011.4235481010010100010000000101000001001
20035.0110.054010.0438690.02310.09.6828411001100100001000000010001001001
30051.0014.03215440.070.0006.010.8972761001010000110000000011000001001
40050.0011.03618000.0119217.02317.011.6525741001100001000100000010001000101
51034.0112.04617430.010.0007.011.7384820110101000010000000100000101001
60054.0011.0338780.010.0001.09.8392691001100000110000000010001001001
71137.0211.04416970.012374.01107.011.5895351001011000010000000100000010101
81034.0010.03411200.010.0001.011.0505571010101000001000000010001001001
90050.007.04818510.070.00117.011.5801451010011000000000010100000011010

Last rows

TARGET_FLAGKIDSDRIVAGEHOMEKIDSYOJTRAVTIMEBLUEBOOKTIFOLDCLAIMCLM_FREQMVR_PTSCAR_AGELOG_INCOMEPARENT1_NoPARENT1_YesMSTATUS_NoMSTATUS_YesSEX_FSEX_MEDUCATION_BachelorsEDUCATION_High SchoolEDUCATION_MastersEDUCATION_PhDEDUCATION_lower than High SchoolJOB_Blue CollarJOB_ClericalJOB_DoctorJOB_Home MakerJOB_LawyerJOB_ManagerJOB_ProfessionalJOB_StudentCAR_USE_CommercialCAR_USE_PrivateCAR_TYPE_MinivanCAR_TYPE_Panel TruckCAR_TYPE_PickupCAR_TYPE_SUVCAR_TYPE_Sports CarCAR_TYPE_VanREVOKED_NoREVOKED_YesURBANICITY_Highly Rural/ RuralURBANICITY_Highly Urban/ Urban
81510054.0013.01819660.0124690.0164.011.3122651001011000000000100100000010101
81520146.0012.02615060.0433026.0301.010.7148401010010100010000000011000001010
81530048.0010.05917430.0130.00418.011.6200381010100001000100000010001001001
81540138.0416.01524740.019245.03315.09.4507741001101000000000001100010001001
81550041.007.0415600.010.0007.08.7414561010010100000000001010010001010
81560035.0011.05127330.0100.0008.010.6715801010010100010000000100100001010
81570145.029.02113270.0150.00217.012.0116991001010001000000100011000001001
81580046.009.03624490.060.0001.011.5824981001010010000000000100100001001
81590050.007.03622550.060.00011.010.6792741001101000000010000011000001001
81600052.0011.06419400.060.0009.010.8824901001100100001000000011000001010